randomized clinical trial
Evaluating LLMs in Medicine: A Call for Rigor, Transparency
Alwakeel, Mahmoud, Nagori, Aditya, Krishnamoorthy, Vijay, Kamaleswaran, Rishikesan
Objectives: To evaluate the current limitations of large language models (LLMs) in medical question answering, focusing on the quality of datasets used for their evaluation. Materials and Methods: Widely used benchmark datasets, including MedQA, MedMCQA, PubMedQA, and MMLU, were reviewed for their rigor, transparency, and relevance to clinical scenarios. Alternatives, such as challenge questions in medical journals, were also analyzed to identify their potential as unbiased evaluation tools. Results: Most existing datasets lack clinical realism, transparency, and robust validation processes. Publicly available challenge questions offer some benefits but are limited by their small size, narrow scope, and possible inclusion in LLM training data. These gaps highlight the need for secure, comprehensive, and representative datasets. Conclusion: A standardized framework is critical for evaluating LLMs in medicine. Collaborative efforts among institutions and policymakers are needed to ensure that datasets and methodologies are rigorous, unbiased, and reflective of clinical complexity.
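One reason a small question set limits evaluation is statistical precision: the same observed accuracy is much less certain when it comes from fewer questions. The sketch below illustrates this with a 95% Wilson score interval for accuracy. The function name and the question counts (40/50 and 4000/5000) are hypothetical illustrations, not figures from the paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for accuracy = correct / total (z=1.96 gives ~95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need total > 0 and 0 <= correct <= total")
    phat = correct / total
    denom = 1 + z ** 2 / total
    centre = (phat + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(phat * (1 - phat) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


# Hypothetical: the same 80% accuracy measured on a small and a large question set.
for correct, total in [(40, 50), (4000, 5000)]:
    lo, hi = wilson_interval(correct, total)
    print(f"{correct}/{total} correct: 95% CI {lo:.3f} to {hi:.3f}")
```

With 50 questions the interval spans roughly 0.67 to 0.89, so two models whose true accuracies differ by several points may be indistinguishable on such a set.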
Two-Stage Penalized Regression Screening to Detect Biomarker-Treatment Interactions in Randomized Clinical Trials
Wang, Jixiong, Patel, Ashish, Wason, James M. S., Newcombe, Paul J.
High-dimensional biomarkers such as genomics are increasingly being measured in randomized clinical trials. Consequently, there is growing interest in developing methods that improve the power to detect biomarker-treatment interactions. We adapt recently proposed two-stage interaction-detection procedures to the setting of randomized clinical trials. We also propose a new stage 1 multivariate screening strategy that uses ridge regression to account for correlations among biomarkers. For this multivariate screening, we prove asymptotic between-stage independence, which is required for family-wise error rate control, under biomarker-treatment independence. Simulation results show that, in various scenarios, the ridge regression screening procedure can provide substantially greater power than the traditional one-biomarker-at-a-time screening procedure when biomarkers are highly correlated. We also illustrate our approach with applications to data from two real clinical trials.
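The two-stage idea can be sketched as follows, under stated assumptions. Stage 1 ridge-regresses the outcome on all biomarkers jointly while ignoring treatment, and keeps a fixed number of biomarkers with the largest absolute coefficients. Stage 2 tests the treatment-by-biomarker interaction only for those survivors, with a Bonferroni split over the selected set rather than over all biomarkers. The exact stage 1 statistic, selection rule, penalty, and stage 2 test used in the paper may differ. The function names (`ridge_screen`, `interaction_pvalue`, `two_stage`), the top-k rule, the normal-approximation p-values, and the simulated data are all hypothetical.

```python
import math

import numpy as np


def ridge_screen(X, y, lam, keep):
    """Stage 1 (assumed form): ridge-regress y on all biomarkers at once,
    ignoring treatment, and return indices of the `keep` largest |coefficients|."""
    Xc = X - X.mean(axis=0)  # centre so the intercept is not penalized
    yc = y - y.mean()
    p = X.shape[1]
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T @ yc)
    return np.argsort(-np.abs(beta))[:keep]


def interaction_pvalue(x, t, y):
    """Stage 2: OLS fit of y ~ 1 + t + x + t*x; two-sided p-value for the
    t*x coefficient via a large-sample normal approximation."""
    n = len(y)
    D = np.column_stack([np.ones(n), t, x, t * x])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ coef
    sigma2 = resid @ resid / (n - D.shape[1])
    var_int = sigma2 * np.linalg.inv(D.T @ D)[3, 3]
    z = coef[3] / math.sqrt(var_int)
    return math.erfc(abs(z) / math.sqrt(2))


def two_stage(X, t, y, lam=1.0, keep=10, alpha=0.05):
    """Screen with ridge, then Bonferroni-test interactions among the selected
    biomarkers only (alpha is divided by `keep`, not by the total count)."""
    selected = ridge_screen(X, y, lam, keep)
    hits = []
    for j in selected:
        pval = interaction_pvalue(X[:, j], t, y)
        if pval < alpha / len(selected):
            hits.append((int(j), pval))
    return sorted(hits)


# Hypothetical toy data: 100 independent biomarkers, coin-flip treatment
# assignment, and one true interaction between treatment and biomarker 5.
rng = np.random.default_rng(0)
n, p = 1000, 100
X = rng.normal(size=(n, p))
t = rng.integers(0, 2, size=n).astype(float)
y = 0.5 * t + 1.0 * t * X[:, 5] + rng.normal(size=n)

hits = two_stage(X, t, y)
print(hits)
```

Because treatment is coded 0/1, an interaction also induces a marginal association between the biomarker and the outcome, which is what a treatment-agnostic stage 1 can pick up. The toy biomarkers are uncorrelated for simplicity. The power advantage of ridge screening reported in the abstract concerns highly correlated biomarkers.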
Glaucoma: Building a New Future with AI
Artificial intelligence (AI), "big data," blockchain and edge computing are all ways of collecting, storing and analyzing information that, if leveraged effectively, could rapidly accelerate progress in healthcare. AI, with its ability to discern patterns, correlations and trends in huge volumes of data, is among a swathe of new technologies that experts expect to have deep, revolutionary impacts across numerous sectors. Ophthalmologists have already shown that AI algorithms can derive objective metrics from simple photographs and optical coherence tomography (OCT) scans, and can quantify the amount of optic nerve damage in glaucoma. Because AI can work quickly and accurately, many experts predict it will help alleviate time and resource pressures against the backdrop of an aging population, a particularly pertinent issue given the shortage of ophthalmologists. Though research into AI and its integration in healthcare are ongoing, AI has the potential to transform a number of areas, including processing and analyzing biomedical, clinical and patient data; medical imaging and diagnostics; drug discovery; biomarker research; personal AI assistants; and genomics. There is also a wealth of AI research underway in retinal disease, notably the Moorfields and DeepMind collaboration: a project investigating the use of AI to read complex eye scans, detect more than 50 eye conditions, and identify patients who require urgent treatment. Anthony Khawaja, a Consultant Ophthalmologist at Moorfields Eye Hospital, explains how the project came about.
Challenges to the Reproducibility of Machine Learning Models in Health Care - Docwire News
Reproducibility has been an important and intensely debated topic in science and medicine for the past few decades.[1] As the scientific enterprise has grown in scope and complexity, concerns have emerged about how well new findings can be reproduced and validated across different scientific teams and study populations. In some instances,[2] the failure to replicate numerous previous studies has added to the growing concern that science and biomedicine may be in the midst of a "reproducibility crisis." Against this backdrop, high-capacity machine learning models are beginning to demonstrate early successes in clinical applications,[3] and some have received approval from the US Food and Drug Administration. This new class of clinical prediction tools presents unique challenges and obstacles to reproducibility, which must be carefully considered to ensure that these techniques are valid and deployed safely and effectively.
Automating Meta-Analyses of Randomized Clinical Trials: A First Look
Michelson, Matthew (InferLink)
A "meta-study" or "meta-analysis" analyzes multiple medical studies related to the same disease, treatment protocol, and outcome measurement to determine whether there is an overall effect (e.g., whether a treatment induces remission or causes adverse effects). Its advantage lies in pooling and analyzing results across independent studies, which increases the effective population size and mitigates some of the experimental bias or inconsistent results of any single study, among other benefits. Meta-studies are important for understanding whether treatments are effective, for influencing clinical guidelines, and for spurring new research directions. However, meta-studies are extremely time-consuming to construct by hand and to keep updated with the latest results. This limits both their breadth of coverage (since researchers will only invest the time for diseases they are interested in) and their practicality. Yet high-quality medical research is growing at a staggering rate, and there is an opportunity to apply automation to this expanding body of knowledge, thereby extending the benefits of meta-studies to (theoretically) all diseases and treatments as new results are published. In the long term, we envision an automatic process for creating meta-studies across all diseases and treatments and keeping them up to date automatically. In this paper we demonstrate that this task is feasible in principle, outline the research directions needed to realize it, and, hopefully, spur significant interest in this compelling and important research direction at the intersection of medical research and machine learning.
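To make "pooling results across independent studies" concrete, the sketch below uses the standard fixed-effect inverse-variance method. Each study's effect estimate (e.g., a log risk ratio) is weighted by 1/SE², which is why pooling behaves like a larger population. Cochran's Q and I² summarize how inconsistent the studies are. This is a textbook technique shown for illustration, not necessarily the method used in the paper's automated pipeline. The function and class names and the three study results in the usage line are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class PooledEffect:
    estimate: float  # inverse-variance weighted mean effect
    se: float        # standard error of the pooled effect
    ci_low: float    # approximate 95% confidence bounds
    ci_high: float
    p_value: float   # two-sided, normal approximation
    q: float         # Cochran's Q heterogeneity statistic
    i2: float        # share of variation attributed to heterogeneity


def fixed_effect_meta(effects, ses):
    """Fixed-effect inverse-variance pooling of per-study effects and standard errors."""
    if not effects or len(effects) != len(ses):
        raise ValueError("need at least one study and one standard error per effect")
    weights = [1.0 / se ** 2 for se in ses]
    total_w = sum(weights)
    est = sum(w * y for w, y in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)  # shrinks as studies (information) accumulate
    p = math.erfc(abs(est / se) / math.sqrt(2))
    q = sum(w * (y - est) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return PooledEffect(est, se, est - 1.96 * se, est + 1.96 * se, p, q, i2)


# Hypothetical log risk ratios and standard errors from three trials.
print(fixed_effect_meta([-0.30, -0.10, -0.25], [0.15, 0.20, 0.10]))
```

An automated system of the kind envisioned would still need to extract comparable effect estimates and standard errors from each publication; a random-effects model is the usual alternative when I² indicates substantial heterogeneity.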